Geometry of Policy Improvement
We investigate the geometry of optimal memoryless, time-independent decision
making in relation to the amount of information that the acting agent has about
the state of the system. We show that the expected long-term reward, discounted
or per time step, is maximized by policies that randomize among at most as many
actions as there are world states consistent with the agent's observation.
Moreover, we show that the expected reward per time step can be
studied in terms of the expected discounted reward. Our main tool is a
geometric version of the policy improvement lemma, which identifies a
polyhedral cone of policy changes in which the state value function increases
for all states.
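The policy improvement lemma the paper builds on can be illustrated on a toy MDP. The following sketch uses made-up transition and reward numbers (not taken from the paper) and checks that one greedy improvement step weakly increases the value in every state:

```python
import numpy as np

# Toy 2-state, 2-action MDP (illustrative numbers only):
# P[a, s, s'] are transition probabilities, R[s, a] are rewards.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.1, 0.9],
               [0.7, 0.3]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

def policy_value(pi):
    """Exact discounted value of a deterministic policy pi (array of actions)."""
    P_pi = np.array([P[pi[s], s] for s in range(2)])
    R_pi = np.array([R[s, pi[s]] for s in range(2)])
    return np.linalg.solve(np.eye(2) - gamma * P_pi, R_pi)

pi = np.array([0, 0])              # arbitrary initial policy
V = policy_value(pi)
Q = R + gamma * (P @ V).T          # Q[s, a] under the current value function
pi_new = Q.argmax(axis=1)          # greedy policy improvement step
V_new = policy_value(pi_new)
# Policy improvement lemma: the value increases (weakly) in every state.
assert np.all(V_new >= V - 1e-9)
```

The lemma guarantees the elementwise inequality, not merely an improvement of the average value, which is what makes the geometric cone-of-improving-directions view possible.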
Two semi-Lagrangian fast methods for Hamilton-Jacobi-Bellman equations
In this paper we apply the Fast Iterative Method (FIM) for solving general
Hamilton-Jacobi-Bellman (HJB) equations and we compare the results with an
accelerated version of the Fast Sweeping Method (FSM). We find that FIM can
indeed be used to solve HJB equations without relevant modifications to the
original algorithm proposed for the eikonal equation, and that it
outperforms FSM in many cases. Observing the evolution of the active list of
nodes for FIM, we recover another numerical validation of the arguments
recently discussed in [Cacace et al., SISC 36 (2014), A570-A587] about the
impossibility of creating local single-pass methods for HJB equations.
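For readers unfamiliar with FIM, here is a minimal sketch of the method for the plain eikonal equation |∇u|·speed = 1 on a uniform grid: an unsorted active list is swept repeatedly with a Godunov upwind local solver, and neighbours of converged nodes are re-activated when they can still improve. This is an illustrative reimplementation of the baseline algorithm, not the accelerated HJB variants compared in the paper.

```python
import numpy as np

def fim_eikonal(speed, sources, h=1.0, tol=1e-9):
    """Fast Iterative Method sketch: active-list iteration with a Godunov
    upwind local solver for |grad u| * speed = 1 on a 2D grid."""
    n, m = speed.shape
    INF = np.inf
    u = np.full((n, m), INF)
    for s in sources:
        u[s] = 0.0

    def local(i, j):
        # Smallest upwind neighbour value along each axis.
        ax = min(u[i - 1, j] if i > 0 else INF, u[i + 1, j] if i < n - 1 else INF)
        ay = min(u[i, j - 1] if j > 0 else INF, u[i, j + 1] if j < m - 1 else INF)
        a, b = min(ax, ay), max(ax, ay)
        c = h / speed[i, j]
        if b - a >= c:                       # causal along one axis only
            return a + c
        return 0.5 * (a + b + np.sqrt(2 * c * c - (a - b) ** 2))

    def neighbors(i, j):
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            if 0 <= i + di < n and 0 <= j + dj < m:
                yield i + di, j + dj

    active = set()
    for s in sources:
        active |= set(neighbors(*s))
    while active:
        nxt = set()
        for i, j in active:
            new = local(i, j)
            if abs(u[i, j] - new) > tol:     # still changing: update and keep
                u[i, j] = new
                nxt.add((i, j))
            else:                            # converged: wake improvable neighbours
                for p, q in neighbors(i, j):
                    if local(p, q) < u[p, q] - tol:
                        nxt.add((p, q))
        active = nxt
    return u

u = fim_eikonal(np.ones((5, 5)), [(0, 0)])   # unit speed, point source at a corner
```

Unlike FSM, which sweeps the whole grid in alternating orders, only the nodes on the active list are touched; the evolution of this list is exactly what the paper observes to argue about single-pass impossibility.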
Evolutionary game of coalition building under external pressure
We study the fragmentation-coagulation (or merging and splitting)
evolutionary control model introduced recently by one of the authors, in which
small players can form coalitions to resist the pressure exerted by the
principal. It is a continuous-time Markov chain, and the players have a
common reward to optimize. We study the behavior as the number of small
players grows and show that the problem converges to a (one-player)
deterministic optimization problem in continuous time, in an
infinite-dimensional state space.
Pseudorehearsal in value function approximation
Catastrophic forgetting is of special importance in reinforcement learning,
as the data distribution is generally non-stationary over time. We study and
compare several pseudorehearsal approaches for Q-learning with function
approximation in a pole balancing task. We have found that pseudorehearsal
seems to assist learning even in such very simple problems, given proper
initialization of the rehearsal parameters.
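The pseudorehearsal idea (sample random pseudo-inputs, freeze the network's current outputs on them as pseudo-targets, and mix those pseudo-items into later training) can be sketched with a linear approximator standing in for the paper's Q-learner. All tasks, dimensions, and learning rates below are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def predict(W, X):
    return X @ W.T

def sgd_step(W, X, y, lr=0.01):
    err = predict(W, X) - y
    return W - lr * (err.T @ X) / len(X)

W = rng.normal(scale=0.1, size=(1, 4))        # tiny linear "network"

# Task A: fit a fixed linear target function.
wA = np.array([[1.0, -2.0, 0.5, 0.0]])
XA = rng.normal(size=(200, 4))
yA = XA @ wA.T
for _ in range(1000):
    W = sgd_step(W, XA, yA)

# Pseudorehearsal: random pseudo-inputs, current outputs as pseudo-targets.
Xp = rng.normal(size=(200, 4))
yp = predict(W, Xp)

# Task B: a different target function on a shifted input region.
wB = np.array([[0.0, 1.0, 1.0, -1.0]])
XB = rng.normal(size=(200, 4)) + 3.0
yB = XB @ wB.T

W_plain, W_rehearse = W.copy(), W.copy()
for _ in range(1000):
    W_plain = sgd_step(W_plain, XB, yB)       # trains on B only, forgets A
    W_rehearse = sgd_step(W_rehearse,
                          np.vstack([XB, Xp]), np.vstack([yB, yp]))

errA_plain = np.mean((predict(W_plain, XA) - yA) ** 2)
errA_rehearse = np.mean((predict(W_rehearse, XA) - yA) ** 2)
assert errA_rehearse < errA_plain    # rehearsal preserves task A better
```

The pseudo-items anchor the function on regions the new data does not cover, which is the mechanism by which pseudorehearsal counters the non-stationary data distribution of reinforcement learning.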
Exploring Graphs with Time Constraints by Unreliable Collections of Mobile Robots
A graph environment must be explored by a collection of mobile robots. Some
of the robots, a priori unknown, may turn out to be unreliable. The graph is
weighted and each node is assigned a deadline. The exploration is successful if
each node of the graph is visited before its deadline by a reliable robot. The
edge weight corresponds to the time needed by a robot to traverse the edge.
Given the number of robots that may crash, is it possible to design an
algorithm that always guarantees the exploration, regardless of which subset
of robots the adversary chooses to be unreliable? We find the optimal time
within which the graph may be explored. Our approach also permits finding the
maximal number of robots that may turn out to be unreliable while the graph is
still guaranteed to be explored.
We concentrate on line graphs and rings, for which we give positive results.
We start with the case of collections involving only reliable robots. We
give algorithms finding optimal times needed for exploration when the robots
are assigned to fixed initial positions as well as when such starting positions
may be determined by the algorithm. We extend our consideration to the case
when some number of robots may be unreliable. Our most surprising result is
that the line exploration problem with robots at given initial positions, some
of which may be crash-faulty, is NP-hard. The same problem admits
polynomial-time solutions for a ring, and also for a line when the robots'
initial positions may be chosen arbitrarily.
The exploration problem is shown to be NP-hard for star graphs, even when the
team consists of only two reliable robots.
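The fault-tolerance condition can be made concrete with a brute-force feasibility checker for tiny line instances. Assuming each robot follows a fixed walk, the worst case is that the f unreliable robots crash at time zero, so a guarantee requires every node to be visited on time by at least f + 1 robots. This is an illustration of the problem statement only, not the paper's algorithms (which, as the abstract notes, cannot be efficient in general for the line with fixed positions):

```python
from itertools import permutations, product

def visit_times(start, order, pos):
    """First-visit times of a robot starting at pos[start] that visits the
    line nodes in the given order (travel time = distance along the line)."""
    t, cur, times = 0.0, pos[start], {}
    for v in order:
        t += abs(pos[v] - cur)
        cur = pos[v]
        times.setdefault(v, t)
    return times

def feasible(pos, deadline, starts, f):
    """Brute force over node orderings: can each robot be assigned a walk so
    that every node is visited by at least f + 1 robots before its deadline,
    tolerating any f robots crashing at time zero?"""
    n = len(pos)
    options = []
    for s in starts:
        covers = set()
        for order in permutations(range(n)):
            times = visit_times(s, order, pos)
            covers.add(frozenset(v for v, t in times.items() if t <= deadline[v]))
        options.append(covers)
    return any(
        all(sum(v in c for c in choice) >= f + 1 for v in range(n))
        for choice in product(*options)
    )
```

On the three-node line with positions [0, 1, 2] and generous deadlines, one robot at node 0 suffices when f = 0 but not when f = 1, while two robots starting at the endpoints tolerate one crash.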
Social learning against data falsification in sensor networks
Sensor networks generate large amounts of geographically distributed data. The conventional approach to exploiting these data is to first gather them in a special node that then performs processing and inference. However, what happens if this node is destroyed, or even worse, if it is hijacked? To explore this problem, in this work we consider a smart attacker who can take control of critical nodes within the network and use them to inject false information. In order to face this critical security threat, we propose a novel scheme that enables data aggregation and decision-making over networks based on social learning, where the sensor nodes act in a way that resembles how agents make decisions in social networks. Our results suggest that social learning enables high network resilience, even when a significant portion of the nodes has been compromised by the attacker.
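The flavour of sequential social learning can be conveyed with a toy chain of nodes, each combining its private evidence with the decisions broadcast by earlier nodes. This sketch is not the paper's protocol: the decision weight q, the signal accuracy p, and the always-lying attacker model are simplifying assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

def run_chain(n, p=0.8, q=0.6, frac_bad=0.0, theta=1):
    """Each honest node adds its private log-likelihood ratio to the history
    of public decisions, each earlier decision being treated (simplifying
    assumption) like an independent signal of accuracy q. Compromised nodes
    always broadcast the wrong decision. Returns honest-node accuracy."""
    llr_sig = np.log(p / (1 - p))      # weight of a private signal
    llr_dec = np.log(q / (1 - q))      # weight of an earlier public decision
    bad = rng.random(n) < frac_bad
    decisions, honest_ok = [], []
    for i in range(n):
        if bad[i]:
            decisions.append(1 - theta)          # injected false report
            continue
        s = theta if rng.random() < p else 1 - theta
        L = llr_sig * (1 if s == 1 else -1)
        L += llr_dec * sum(1 if d == 1 else -1 for d in decisions)
        d = 1 if L > 0 else 0
        decisions.append(d)
        honest_ok.append(d == theta)
    return float(np.mean(honest_ok))

acc_clean = run_chain(300, frac_bad=0.0)
acc_attacked = run_chain(300, frac_bad=0.3)
assert acc_clean > 0.6   # honest nodes settle on the true state
```

Because each broadcast decision carries a bounded weight, a minority of false reports can only shift the running evidence by a bounded amount per node, which is one intuition for the resilience the abstract reports; a single hijacked fusion center has no such bound.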
A Semi-Lagrangian scheme for a modified version of the Hughes model for pedestrian flow
In this paper we present a Semi-Lagrangian scheme for a regularized version
of the Hughes model for pedestrian flow. Hughes originally proposed a coupled
nonlinear PDE system describing the evolution of a large pedestrian group
trying to exit a domain as fast as possible. The original model corresponds to
a system of a conservation law for the pedestrian density and an Eikonal
equation to determine the weighted distance to the exit. We consider this model
in presence of small diffusion and discuss the numerical analysis of the
proposed Semi-Lagrangian scheme. Furthermore, we illustrate the effect of
small diffusion on the exit time with various numerical experiments.
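The core semi-Lagrangian update (follow the characteristics backward one time step and interpolate) can be sketched for plain 1D advection with a constant walking speed toward an exit. The full scheme for the Hughes model couples this step with the diffusion term and an eikonal solve for the weighted exit distance; the velocity field and all numbers here are illustrative only.

```python
import numpy as np

# 1D semi-Lagrangian advection sketch on [0, 1] with exit at x = 0.
nx, dt, nsteps = 200, 0.01, 30
x = np.linspace(0.0, 1.0, nx)
rho = np.exp(-200.0 * (x - 0.7) ** 2)     # initial pedestrian density bump
v = -np.ones(nx)                          # everyone walks toward the exit

for _ in range(nsteps):
    feet = x - dt * v                     # departure points of characteristics
    rho = np.interp(feet, x, rho)         # interpolate density at those points

center = float(np.sum(x * rho) / np.sum(rho))   # bump has moved toward the exit
```

Because the update traces characteristics rather than differencing fluxes, the scheme remains stable for time steps not restricted by a CFL condition, which is one reason semi-Lagrangian methods are attractive for this coupled system.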
Building collaboration in multi-agent systems using reinforcement learning
© Springer Nature Switzerland AG 2018. This paper presents a proof-of-concept study demonstrating the viability of building collaboration among multiple agents through the standard Q-learning algorithm embedded in particle swarm optimisation. Collaboration is formulated to be achieved among the agents via competition, where the agents are expected to balance their actions in such a way that none of them drifts away from the team and none intrudes into a fellow neighbour's territory. Particles are equipped with Q-learning for self-training, to learn how to act as members of a swarm and how to produce collaborative/collective behaviours. The experimental results support the proposed idea, suggesting that substantive collaboration can be built via the proposed learning algorithm.
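The "stay with the team, stay out of the neighbour's territory" behaviour can be caricatured with a single tabular Q-learning agent, as a minimal sketch of the learning component. The line world, territory bounds, and reward shaping below are hypothetical; the paper embeds Q-learning inside a particle swarm rather than in this toy environment.

```python
import numpy as np

rng = np.random.default_rng(0)

# One agent on a 10-cell line learns to stay inside its territory, cells
# 3..6: neither drifting away nor entering a neighbour's cells.
N = 10
ACTIONS = (-1, 0, 1)                       # left, stay, right
Q = np.zeros((N, len(ACTIONS)))
alpha, gamma, eps = 0.2, 0.9, 0.1

def reward(s):
    return 1.0 if 3 <= s <= 6 else -1.0    # hypothetical territory shaping

s = 0
for step in range(50000):
    a = int(rng.integers(3)) if rng.random() < eps else int(Q[s].argmax())
    s2 = min(N - 1, max(0, s + ACTIONS[a]))
    Q[s, a] += alpha * (reward(s2) + gamma * Q[s2].max() - Q[s, a])
    s = s2
    if step % 250 == 0:                    # restarts so every cell gets visited
        s = int(rng.integers(N))

policy = Q.argmax(axis=1)
# Greedy rollouts from every cell should end up inside the territory.
for s0 in range(N):
    s = s0
    for _ in range(20):
        s = min(N - 1, max(0, s + ACTIONS[int(policy[s])]))
    assert 3 <= s <= 6
```

In the swarm setting of the paper, each particle would run such an update with its own territory and with neighbours' positions entering the state, so that balanced, collision-free collective behaviour emerges from individually learned policies.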